Initial check-in of changes to llms() API improving model name resolution#74

Merged
datancoffee merged 11 commits into develop from feature/tow-501-sdk-implement-experimental-llm-feature-that-retrieves-best
Jul 31, 2025

Conversation

@datancoffee
Contributor

The goal of this PR was to let developers use LLMs across their development and production environments just by specifying a short model family name, without writing lots of conditional statements to handle differences between environments.

E.g. calling llms("llama3.2") should resolve to the model "llama3.2:3b" when working with ollama local inference and to model "meta-llama/Llama-3.2-3B-Instruct" when using Hugging Face Hub serverless inference.

Differences in model naming, and the fact that each model is actually a collection of dozens of variants that differ in parameter count (distillation) and quantization technique, made name resolution a tricky task.

Here are the new capabilities of the llms() API:

  1. Tower now recognizes ~170 model family names as of August 2025. These families comprise 100,000+ individual models.

  2. Users can specify a model family, e.g. llms("deepseek-r1"), in both local and Tower cloud environments, and Tower will resolve the model family to a particular model that is available for inference:

  • in the local environment, Tower finds the model that is installed and, if several are installed, picks the one with the largest parameter count
  • in Tower cloud environments, Tower takes the first model returned by HF search, verifying that the model is servable by the Inference Service when users specify one
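The local rule above can be sketched as follows (the function names and the installed-model data shape are illustrative assumptions, not Tower or ollama internals):

```python
def _params_in_billions(size: str) -> float:
    """Parse a size string like '3B' into a float; unparseable sizes sort last."""
    try:
        return float(size.rstrip("Bb"))
    except (ValueError, AttributeError):
        return 0.0

def resolve_local_model(family: str, installed: list) -> str:
    """Return the installed model of `family` with the most parameters, or None."""
    candidates = [m for m in installed if m["name"].split(":")[0] == family]
    if not candidates:
        return None
    return max(candidates, key=lambda m: _params_in_billions(m["parameter_size"]))["name"]

installed = [
    {"name": "llama3.2:1b", "parameter_size": "1B"},
    {"name": "llama3.2:3b", "parameter_size": "3B"},
]
print(resolve_local_model("llama3.2", installed))  # llama3.2:3b
```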

In addition to model family names, users can also specify a particular model in both local and Tower cloud environments:

  • locally: llms("deepseek-r1:14b") or llms("llama3.2:latest")
  • serverlessly: llms("deepseek-ai/DeepSeek-R1-0528") or llms("meta-llama/Llama-3.2-3B-Instruct")
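The three accepted spellings can be told apart with a simple shape heuristic (a sketch based on the examples above; the actual parsing in Tower may differ):

```python
def classify_model_spec(spec: str) -> str:
    """Rough classification of an llms() argument based on its shape."""
    if "/" in spec:   # HF repo id, e.g. "meta-llama/Llama-3.2-3B-Instruct"
        return "huggingface_model"
    if ":" in spec:   # ollama tag, e.g. "deepseek-r1:14b" or "llama3.2:latest"
        return "ollama_model"
    return "model_family"  # bare family name, e.g. "llama3.2"

print(classify_model_spec("llama3.2"))                          # model_family
print(classify_model_spec("deepseek-r1:14b"))                   # ollama_model
print(classify_model_spec("meta-llama/Llama-3.2-3B-Instruct"))  # huggingface_model
```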

Expected use:

A developer wants to use a model of the "llama3.2" family, first in development and then in production. They would add this code to their Tower app:

```python
import os

model_name = os.getenv("MODEL_NAME")
llm = llms(model_name)
```

They would set up their environments as follows:

"dev" environment:

Tower Secrets:

MODEL_NAME = "llama3.2"
(any model of this family installed locally will do)

TOWER_INFERENCE_ROUTER=ollama

"prod" environment:

Tower Secrets:

MODEL_NAME = "meta-llama/Llama-3.2-3B-Instruct"
(use a particular model)

TOWER_INFERENCE_ROUTER=hugging_face_hub
TOWER_INFERENCE_ROUTER_API_KEY=hf_123456789
TOWER_INFERENCE_SERVICE=<together|...>

@datancoffee datancoffee requested review from bradhe and Copilot July 27, 2025 16:06

datancoffee and others added 4 commits July 27, 2025 18:15
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…that-retrieves-best' of https://github.com/tower/tower-cli into feature/tow-501-sdk-implement-experimental-llm-feature-that-retrieves-best
@datancoffee datancoffee requested a review from Copilot July 27, 2025 16:39

datancoffee and others added 2 commits July 27, 2025 18:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@bradhe bradhe left a comment


This is a good start, I see where you're going: You want to provide multiple backends to tower.llms. A few comments on this generally.

  1. It'd be easier/more maintainable to provide multiple backends that the Llm class uses, instead of intermixing the logic in a single class. There should probably be a VllmBackend, a HuggingFaceBackend, and an OllamaBackend. They should all share the same interface; only the setup will differ slightly, and Llm should choose which backend to use based on the TowerContext, I suppose?
  2. I don't really want to invent new terminology, is "inference router" what VLLM and HuggingFace call themselves? I'm good as long as we're not inventing it!
  3. HuggingFace can actually do this local routing for us (to vllm too!): https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client. I just stumbled on that while looking up point 2 above :) Should we consider just using that?
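The backend split suggested in point 1 could look roughly like this (class and method names are assumptions for illustration, not the actual tower-cli code):

```python
from abc import ABC, abstractmethod

class LlmBackend(ABC):
    """Shared interface for all backends; only the setup of each differs."""
    @abstractmethod
    def complete_chat(self, model: str, messages: list) -> str:
        ...

class OllamaBackend(LlmBackend):
    def complete_chat(self, model: str, messages: list) -> str:
        raise NotImplementedError("would call the local ollama API here")

class HuggingFaceBackend(LlmBackend):
    def complete_chat(self, model: str, messages: list) -> str:
        raise NotImplementedError("would call HF serverless inference here")

def backend_for(router: str) -> LlmBackend:
    """Llm would pick a backend from context, e.g. TOWER_INFERENCE_ROUTER."""
    backends = {"ollama": OllamaBackend, "hugging_face_hub": HuggingFaceBackend}
    return backends[router]()
```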

@bradhe
Contributor

bradhe commented Jul 27, 2025

Also can you do a uv sync? Looks like the uv.lock is out of date!

@datancoffee
Contributor Author

2. Is "inference router" what VLLM and HuggingFace call themselves? I'm good as long as we're not inventing it!

As discussed, I will rename inference_service to inference_provider, because this is terminology used by hugging face.

Inference Router is something I came up with, because we are dealing with services that route inference requests to inference providers.

@datancoffee
Contributor Author

Also can you do a uv sync? Looks like the uv.lock is out of date!

done

1. added HF Hub and Ollama as dependencies to pyproject.toml (not a brad comment)
2. bumped HF version to 0.34.3
3. updated pytest.ini.template with settings to find source code
4. in tower context and elsewhere, renamed inference_service to inference_provider
5. in llms, addressed a bunch of comments from brad
6. removed test_env.py
7. addressed lots of brad comments re test_llms.py
8. updated uv.lock
@datancoffee datancoffee requested a review from Copilot July 30, 2025 16:28
Contributor

Copilot AI left a comment


Pull Request Overview

This PR improves the llms() API to enable seamless model name resolution across development and production environments. The enhancement allows developers to use short model family names (like "llama3.2") that automatically resolve to appropriate provider-specific model names based on the configured inference router.

  • Adds support for ~170 model families with automatic name resolution
  • Enables environment-agnostic model specification through inference router configuration
  • Replaces hardcoded model mappings with dynamic resolution logic

Reviewed Changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/tower/_llms.py: Core implementation of model name resolution logic and updated LLM interface
  • src/tower/_context.py: Updates context to use new inference router configuration parameters
  • tests/tower/test_llms.py: Comprehensive test suite for model resolution functionality
  • tests/tower/test_tables.py: Import path fix for pyiceberg dependency
  • pyproject.toml: Updated dependencies and versions for AI functionality
  • pytest.ini.template: Test configuration template with environment variable setup
  • .vscode/settings.json: VS Code pytest configuration
  • README.md: Updated testing documentation
Comments suppressed due to low confidence (7)

src/tower/_llms.py:43

  • The model 'deepseek-ai/DeepSeek-R1-0528' appears to reference a specific date (0528) that may not exist. This could be a placeholder or incorrect model identifier that should be verified against actual available models.
    "deepseek-coder",

src/tower/_llms.py:265

  • [nitpick] Variable naming is inconsistent with Python conventions. Should use snake_case: 'parameter_size' should be 'parameter_size' (already correct) but the assignment should have spaces around the equals sign for consistency.
            parameter_size=details.get('parameter_size', '')

src/tower/_llms.py:266

  • [nitpick] Variable assignment lacks spaces around the equals sign, inconsistent with Python style conventions.
            quantization_level=details.get('quantization_level', '')

tests/tower/test_llms.py:162

  • This test uses a hardcoded model name that may not exist or may become unavailable, making the test brittle. Consider using a mock or a more stable test model.
        llm = llms("deepseek-ai/DeepSeek-R1")

tests/tower/test_llms.py:201

  • This test makes actual API calls to external services, which could fail due to network issues or model unavailability. Consider mocking the external API calls for more reliable testing.
        llm = llms("deepseek-ai/DeepSeek-R1")

pyproject.toml:43

  • The version constraint 'huggingface-hub>=0.34.3' may specify a version that doesn't exist, as my knowledge cutoff is January 2025 and this appears to be from July 2025. Please verify this version exists.
  "huggingface-hub>=0.34.3",

pyproject.toml:44

  • The version constraint 'ollama>=0.4.7' may specify a version that doesn't exist, as my knowledge cutoff is January 2025 and this appears to be from July 2025. Please verify this version exists.
  "ollama>=0.4.7",

```python
def infer_with_ollama(ctx: TowerContext, model: str, messages: list, is_retry: bool = False) -> str:
def complete_chat_with_ollama(ctx: TowerContext, model: str, messages: list, is_retry: bool = False) -> str:

# TODO: remove the try/except and don't pull the model if it doesn't exist. sso 7/20/25
```
Copilot AI Jul 30, 2025


The TODO comment references a future date (7/20/25) and mentions removing important error handling logic. This creates technical debt and the automatic model pulling behavior should be properly documented or addressed rather than left as a TODO.

Contributor

@bradhe bradhe left a comment


Disable the failing tests and you can land this. We need a green build, because we need to cut a release such that we can build you a binary, etc., to install.

@datancoffee datancoffee merged commit 0929abe into develop Jul 31, 2025
18 checks passed
@datancoffee datancoffee deleted the feature/tow-501-sdk-implement-experimental-llm-feature-that-retrieves-best branch July 31, 2025 17:30

3 participants